Hello dynr lab (and possibly a wider audience if it
comes to that but I have doubts)!
These are some snippets of some analyses I’ve conducted for the book chapter I’ve been working on with Sy-Miin and Peter written into a bloggy-kind of style. One thing I think we need more of on the QDev tutorials site is more context and personality. At times, tutorials can be written so directly that they can be hard to follow. So, I decided to take a stab at writing one in a more conversational fashion.
Broadly, the book chapter is about introducing fuzzy clustering to the behavioral sciences and making use of it in a way that isn’t traditionally seen in the literature. Before I can really delve into that, though, I should probably give a very brief primer on what clustering is in general before going off on the unique aspects of fuzziness.
In this section, I’ll describe the goals of community detection in a
broad sense. Generally, community detection is for identifying groups of
things that are more similar to each other than they are to
other things. We live in a world defined by groups and we
implicitly know that some things belong with each other while other
things don’t.
Take–for instance–vegetables and fruits. We can look at some identifying features of an apple and determine that apples are qualitatively different from a carrot. As we extend this comparison to more fruits and vegetables, we can begin to draw boundaries between which foods “belong together”. The first part of this tutorial will simply be that, looking for groups.
Why do we give a $h!+ if there are “communities” in our
data? Well, good question, Rhetorical Jonathan! Why finding
communities in our data matters depends mainly on the kind of data we’re
working with. With our earlier example on fruits and vegetables, we
might want to identify key characteristics that make a fruit a fruit and
a vegetable a vegetable. Things like the presence of seeds, or how
disgusting they are to have at dinner, can be used to describe
the differences.
Of course, as humans we’re pretty good at drawing arbitrary boundaries between things and distinguishing groups. However, it’s another thing entirely to teach a computer how to understand the differences between things. Thus, we need to formulate those differences numerically.
Below, we introduce the concept of an adjacency matrix:
\[\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}\]
This is the heart and soul of community detection and our best friend in our endeavor to teach computers to find groups good [grammatical error intentional, for emphasis]. The adjacency matrix is a symmetric matrix where each off-diagonal cell, \(a_{ij}\) with \(i \neq j\), indicates the similarity or connection strength between the \(i^{th}\) and \(j^{th}\) cases. Typically, the diagonal elements will be fixed to \(0.00\).
This will then be fed into any number of community detection algorithms of varying sophistication and ultimately yield some sort of communities that we can evaluate!
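To make this concrete, here’s a toy adjacency matrix in R with completely made-up similarity scores for four cases:

```r
# A toy adjacency matrix for four cases (hypothetical similarity scores)
A = matrix(c(0.0, 0.8, 0.1, 0.2,
             0.8, 0.0, 0.3, 0.1,
             0.1, 0.3, 0.0, 0.9,
             0.2, 0.1, 0.9, 0.0),
           nrow = 4, byrow = TRUE)
isSymmetric(A)  # TRUE: similarity is mutual
diag(A)         # diagonal fixed to 0, per convention
```

Here, cases 1 and 2 look like one community and cases 3 and 4 like another, which is exactly the kind of structure a community detection algorithm should recover.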
So, let’s say we have an adjacency matrix… What’s a good community detection algorithm to start with? Well, \(k\)-means of course! Let’s introduce that concept and then apply it to some real data.
We can introduce \(k\)-means clustering as an algorithm designed to assign \(N\) data-points to \(k\) communities. We can describe the algorithm as follows: initialize \(k\) centroids, assign each data-point to its nearest centroid, recompute each centroid as the mean of its assigned points, and repeat the last two steps until the assignments stop changing.
The objective function that we try to minimize is the sum of squared distances between each subject and its assigned centroid across all clusters:
\[\mathop{\mathrm{arg\,min}}_{S} \sum_{i=1}^{k}\sum_{x\in S_{i}} \left\lVert x - \mu_i\right\rVert^2\] where \(k\) is the total number of clusters, \(S_i\) is the \(i^{th}\) cluster, and \(\mu_i\) is the centroid of the \(i^{th}\) cluster. Simply put, \(k\)-means clustering attempts to partition the data such that the within-cluster variance is minimized relative to the cluster-specific centroid. Simple as that!
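Base R ships with kmeans(), so we can watch this objective get minimized on some made-up two-cluster data before touching anything real:

```r
set.seed(1)
# Two made-up clouds of 20 points each, centered at (0,0) and (5,5)
x = rbind(matrix(rnorm(40, mean = 0), ncol = 2),
          matrix(rnorm(40, mean = 5), ncol = 2))
fit = kmeans(x, centers = 2)
fit$centers       # estimated centroids (roughly 0,0 and 5,5)
fit$tot.withinss  # the minimized objective: total within-cluster SS
```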
So, let’s give it a shot with some real data.
As I noted before, the information we put into our adjacency matrix really determines how we will interpret the results that we get out of a community detection search. In this case, we’ll be exploring the De Vos et al. (2017) dataset, which contains \(N = 54\) subjects who either have a clinical diagnosis of Major Depressive Disorder (MDD) or are functionally pair-matched controls (\(n_{dep} = 27\); \(n_{con} = 27\)). In this first set of analyses, we will be assessing for cluster-based differences in the mean-structures of all subjects and asking whether cluster assignment aligns with clinical diagnoses. Essentially, we are searching for clusters of individuals based on their within-subject means on 14 affect-related variables.
We’ll be using the fclust package and–specifically–the
FKM() function from that package to conduct our analyses
for the sake of future arguments down the road. We’ll begin by reading
in the data:
# Read in the raw De Vos data, then attach variable names and subject IDs
tdat = read.csv('./Datasets/raw.csv', sep = '', skip = 1, header = FALSE)
ID = rep(1:54, each = 90)  # 54 subjects, 90 occasions each
tdat = cbind(tdat[, 2:16], ID)
vars = c("talkative", "energetic", "tense", "anxious", "enthusiastic",
         "confident", "distracted", "restless", "irritated", "satisfied",
         "happy", "depressed", "cheerful", "guilty")
names(tdat) = c('Group', vars, 'ID')
Now that the data are read in, we can select our variables and enumerate out some groups. Since we know that there are 2 clinical groups, why don’t we try to extract 2 groups and then extract a third group for the sake of comparison:
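The chunk that produced the cluster solutions and plots isn’t shown here; based on the object names used later (clust11, clust12) and the FKM() interface, it presumably looked something like this, where means (a subjects-by-variables matrix of within-person means) is my assumption:

```r
library(fclust)
# A fuzzifier m just above 1 makes FKM behave like hard k-means
clust11 = FKM(means[, vars], k = 2, m = 1.01)  # 2-group solution
clust12 = FKM(means[, vars], k = 3, m = 1.01)  # 3-group solution
```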
The plots above show us that the group placements really begin to differ in that central area where depression and happiness are at the middle of the scale; the third group essentially absorbs a majority of those placed in that central area.
So, how might we go about deciding which enumeration to use? 2-groups
or 3? Well, the fclust package does have some metrics for
assessing the quality of our partitions:
# Cluster Quality Indices
round(Fclust.index(clust11),2); round(Fclust.index(clust12),2)
## The default value alpha=1 has been set for computing SIL.F
## PC PE MPC SIL SIL.F XB
## 1.00 0.00 1.00 0.57 0.57 0.25
## The default value alpha=1 has been set for computing SIL.F
## PC PE MPC SIL SIL.F XB
## 1.00 0.00 1.00 0.58 0.58 0.33
While I could go into detail regarding what each of these means, I’ll avoid it for now. Just know that all of the values are essentially equivalent, with very minor differences across them. Thus, without a clear sign of certainty, we would go with the more parsimonious option: a 2-group solution.
Now, let’s look at our groups:
From the above regularized correlation network, we can see that the first group–which is primarily composed of our depressed subjects–is sparser in the inter-affect space relative to our second group [this can be judged both by the boldness of the lines and by the fact that the repulsion used to generate the placement of nodes in the graph results in them being further apart].
In addition, these groups differ quantitatively in some features such as their mean-levels of depression and happiness–for example:
# t-test on Depression
t.test(means[clust11$clus[,1]==1,12],means[clust11$clus[,1]==2,12])
##
## Welch Two Sample t-test
##
## data: means[clust11$clus[, 1] == 1, 12] and means[clust11$clus[, 1] == 2, 12]
## t = 6.9212, df = 25.54, p-value = 2.635e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.646636 3.039621
## sample estimates:
## mean of x mean of y
## 3.846827 1.503698
# t-test on Happiness
t.test(means[clust11$clus[,1]==1,11],means[clust11$clus[,1]==2,11])
##
## Welch Two Sample t-test
##
## data: means[clust11$clus[, 1] == 1, 11] and means[clust11$clus[, 1] == 2, 11]
## t = -6.3747, df = 28.393, p-value = 6.305e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.348620 -1.206846
## sample estimates:
## mean of x mean of y
## 2.972703 4.750436
Now, that’s all fine and good but I bet you’re wondering how well these match up with clinical diagnoses. Well, we can check that!
Fclust.compare(c(rep(1,27),rep(2,27)), clust11$U)
## ARI.F RI.F JACCARD.F
## 0.3854101 0.6925227 0.5319149
The adjusted Rand index–or ARI–is a measure of how well we classified people given that we know the true membership. An ARI above 0.75 would be ideal, but here we get a value of 0.39, which isn’t the best. But what does that really mean?
Well, looking at the raw numbers, it means that we classified about 10 people incorrectly relative to the true diagnoses; mainly, these were subjects in the depressed group incorrectly labeled as controls.
Now, the title of this section is “Hard” mean-level clustering. Does that mean that there’s an alternative version we could use? Indeed there is. It’s the fuzzy clustering approach! Let’s dig into that now.
Addressing the question of: what is “fuzziness” would probably be a good place to start. Fundamentally, fuzziness is a concept pertaining to the idea of uncertainty around a boundary. In the case of community detection, it’s about allowing for some ambiguity around the edges of our clusters to acknowledge that things don’t always fall into neat little bins.
Let’s go back to the original example where we talked about fruits and vegetables. While an apple is clearly a fruit and a carrot is a vegetable, you’ll still see people at each other’s throats when it comes to things like cucumbers and tomatoes being fruit or vegetable. The reason for this isn’t worth our time here, but it does highlight the need for methods that acknowledge that some things are so close to the boundaries that we may want to allow for more wiggle room in our analyses.
So, how is fuzzy clustering different than hard clustering? In the \(k\)-means case, we evolve into fuzzy \(c\)-means where:
\[\mathop{\mathrm{arg\,min}}_{W,\,C} \sum_{i=1}^{n}\sum_{j=1}^{c} w_{ij}^m\left\lVert x_{i} - c_j\right\rVert^2\] where \(m\) is the fuzzifier–\(m \in \mathbb{R}\) with \(m > 1\)–which determines the degree of fuzziness in the cluster solution, \(c\) is the number of clusters, \(c_{j}\) is the centroid of the \(j^{th}\) cluster, and \(w_{ij}\) is the degree to which the \(i^{th}\) element of \(x\) belongs to the \(j^{th}\) cluster. Formally, \(w_{ij}\) may be calculated as:
\[w_{ij} = \frac{1}{\sum_{k=1}^{c}\left(\frac{\left\lVert x_{i} - c_{j}\right\rVert}{\left\lVert x_{i} - c_{k}\right\rVert}\right)^{\frac{2}{m-1}}}\]
The calculation of the centroids must also be reworked due to the introduction of fuzziness. In particular, the centroids must be weighted by the degree of membership and can be expressed as:
\[c_{j} = \frac{\sum_{i=1}^{n}w_{ij}^{m}x_{i}}{\sum_{i=1}^{n}w_{ij}^{m}}\]
where the centroid of the \(j^{th}\) cluster is the mean of all data-points weighted by their degree of membership in that specific cluster.
So, essentially, the only real difference is that we include a weight in the objective function that determines how “fuzzy” we will allow our clusters to be. Simple as that!
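To make the weight and centroid formulas concrete, here is a tiny hand-rolled computation for three 1-D points and two fixed centroids (all values made up for illustration):

```r
# Hand-rolled fuzzy membership degrees
x    = c(0, 1, 4)   # three 1-D data points
cent = c(0.5, 5)    # two fixed centroids
m    = 2            # the fuzzifier

W = t(sapply(x, function(xi){
  d = abs(xi - cent)  # distance from this point to each centroid
  sapply(d, function(dj) 1 / sum((dj / d)^(2 / (m - 1))))
}))
W           # row i = membership degrees of point i in the two clusters
rowSums(W)  # memberships sum to 1 for each point

# Centroid update: membership-weighted mean of the data
colSums(W^m * x) / colSums(W^m)
```

Notice that the point at 1 sits between the centroids and so receives non-trivial membership in both clusters; that is exactly the “fuzziness” we’ll exploit below.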
In the physics literature, fuzziness is a way of making the centroids more robust to outliers in the data, and that could certainly be useful here in the behavioral sciences; however, there may be some additional uses that we’re not quite privy to yet. Let’s check it out by running FCM on the De Vos data.
Below, we analyze the De Vos data again, but this time we use fuzzy clustering. In addition, I took the liberty of emphasizing data-points that do not firmly belong to one group or another (black points). In the two dimensions shown–of the 14 possible–the ambiguous subjects tend to sit in the middle of the depressed-happy continuum.
So, how might we go about deciding which enumeration to use? 2-groups
or 3? Well, the fclust package does have some metrics for
assessing the quality of our partitions:
# Cluster Quality Indices
round(Fclust.index(clust21),2); round(Fclust.index(clust22),2)
## The default value alpha=1 has been set for computing SIL.F
## PC PE MPC SIL SIL.F XB
## 0.75 0.40 0.50 0.57 0.72 0.17
## The default value alpha=1 has been set for computing SIL.F
## PC PE MPC SIL SIL.F XB
## 0.68 0.58 0.51 0.58 0.65 0.24
In the quality indices above, we actually see that the fuzzy 2-cluster solution is better than the fuzzy 3-cluster solution with higher PC and SIL.F values and a lower PE and XB value. Thus, we would lean towards preferring this enumeration but how does it perform?
Well, looking at the raw numbers, we put only 5 subjects into the wrong diagnostic group, compared to 10 with the hard clustering approach: 1 control into the depressed group and 4 depressed into the control group. This is a 50% reduction in error!
On the flip side, we did exclude 15 people from being clustered. What’s with that? Don’t they have meaningful data? Well, yes. They absolutely do. Let’s describe the “fuzzy” people!
In sum, all but 2 of the fuzzy people are subjects with clinical diagnoses of major depressive disorder who are undergoing out-patient care. Thus, if things are working as intended, the reason these people don’t match well with either the purely depressed group or the purely healthy group is that they are somewhere in the middle.
Let’s check out their depression levels:
t.test(means[c(1,2,3,6,7,9,16,18,20,21,22,24,26,28,39),12],
means[c(4,8,12:15,19,23,25,27,29),12])
##
## Welch Two Sample t-test
##
## data: means[c(1, 2, 3, 6, 7, 9, 16, 18, 20, 21, 22, 24, 26, 28, 39), 12] and means[c(4, 8, 12:15, 19, 23, 25, 27, 29), 12]
## t = -6.0385, df = 15.678, p-value = 1.879e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.074615 -1.474804
## sample estimates:
## mean of x mean of y
## 2.607022 4.881731
t.test(means[c(1,2,3,6,7,9,16,18,20,21,22,24,26,28,39),12],
means[c(5,10,11,17,30:39,40:54),12])
##
## Welch Two Sample t-test
##
## data: means[c(1, 2, 3, 6, 7, 9, 16, 18, 20, 21, 22, 24, 26, 28, 39), 12] and means[c(5, 10, 11, 17, 30:39, 40:54), 12]
## t = 6.1438, df = 23.934, p-value = 2.424e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.8441697 1.6984372
## sample estimates:
## mean of x mean of y
## 2.607022 1.335719
As expected, we see that the fuzzy people–despite their diagnoses–are statistically higher in depression compared to the solidly depressed group yet still appreciably lower than the controls. Nifty.
Ultimately, the fuzziness allows us to be more certain of the people who are within the groups at the cost of excluding people should we choose to threshold them out.
Now, all of the discussion to this point has been on the means but we also care about the dynamics; that is, how do people differ in how they prototypically change over time?
First, we’ll start off with the problem we always need to solve when doing community detection:
How do we define our adjacency matrix?
Given that \(\mathbf{A}\) is a symmetric matrix that defines how similar two things are, there is a host of ways for us to take the values from a dynamic model to define “similarity”. Take–for instance–that we decide to fit a vector autoregressive (VAR) model to our data in the form:
\[\mathbf{\eta}_{(t)} = \mathbf{\mu} + \mathbf{\Phi}\mathbf{\eta}_{(t-1)} + \mathbf{\zeta}_{t}\]
where \(\mathbf{\eta}\) is a p-variate vector of scores, \(\mathbf{\mu}\) is a p-variate vector of intercepts, \(\mathbf{\Phi}\) is a \(p\times p\) lagged regression matrix, and \(\mathbf{\zeta}\) is a p-variate residual vector assumed to have zero means and covariance matrix, \(\mathbf{\Psi}\).
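To ground the notation, here’s a minimal simulation of a bivariate VAR(1) process in base R; the coefficient values are arbitrary choices of mine, not from the chapter:

```r
set.seed(123)
p   = 2
mu  = c(0, 0)                                  # intercepts
Phi = matrix(c(0.5, 0.2,
               0.0, 0.4), p, p, byrow = TRUE)  # lagged regression matrix
Tt  = 300
eta = matrix(0, Tt, p)
for(t in 2:Tt){
  eta[t,] = mu + Phi %*% eta[t - 1,] + rnorm(p)  # zeta ~ N(0, I)
}
# OLS of eta_t on eta_{t-1} roughly recovers Phi (coef() returns t(Phi))
round(coef(lm(eta[-1,] ~ eta[-Tt,] - 1)), 2)
```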
Several parameters in the VAR(1) equation could be of use when attempting to identify unique subgroups of individuals. Subjects–as in the previous illustrations–could differ in their mean-levels across variables. In which case, \(\mathbf{\mu}\) may be of interest to us. In addition, if one is interested in the dynamics and finding groups of individuals who change similarly to one another through time, the \(\mathbf{\Phi}\) matrix may be worth investigating. Other researchers have also leveraged the residual covariance matrices or otherwise extracted information from them (i.e., graphical VAR) to further model contemporaneous effects.
However, in addition to leveraging different elements of the VAR(1)
model, we must also consider how we transform the information contained
in the coefficient matrices to our definition of “distance” in the
adjacency matrix. We can refer to past work on clustering VAR-based
models for how they operationally defined their adjacency matrices. For
instance, the subgrouping option in the group iterative multiple model
estimation (S-GIMME) procedure counts the number of raw coefficients
that are similar between any pair of subjects. If Subject X
and Subject Y both share a connection, \(\phi_{13}\), that is both statistically
significant and identical in sign (i.e., \(\pm\)), then a count is added to the cell,
\(a_{xy}\), of the adjacency matrix.
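This counting rule is easy to express directly: given two subjects’ sparsified \(\mathbf{\Phi}\) matrices, count the coefficients that are nonzero in both and agree in sign. The toy matrices below are made up for illustration:

```r
# Toy sparsified Phi matrices for two subjects (0 = pruned, non-significant path)
phi_x = matrix(c(0.4,  0.0,
                 0.2,  0.3), 2, 2, byrow = TRUE)
phi_y = matrix(c(0.5,  0.0,
                -0.1,  0.2), 2, 2, byrow = TRUE)
a_xy = sum(sign(phi_x) == sign(phi_y) & phi_x != 0 & phi_y != 0)
a_xy  # 2: the two autoregressions are shared; the cross-path differs in sign
```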
Other methods like the subgrouping option for chain graphical VAR
(scgVAR) models count all similar paths including if a pair of subjects
share a common null path (i.e., \(\phi_{13} =
0.00\)). In a separate domain of work by Bulteel and colleagues
(2016), VAR models have also been clustered based on the Euclidean
distance of the elements of the \(\mathbf{\Phi}\) between any pair of
subjects. Thus, while some algorithms prioritize commonality in the raw
structure, others prioritize the relative distance between coefficient
matrices themselves. These nuanced differences can materially change
the resulting cluster solution.
For instance, the ALS VAR–in an effort to maximally separate the generated clusters–tends to avoid allowing subgroups to share common paths with one another, whereas the S-GIMME and scgVAR approaches do not have this tendency. On the other hand, the ALS VAR is able to detect differences between groups based on effect sizes (i.e., \(\phi_{13} = 0.21\) vs \(\phi_{13} = 0.63\)). This then leads down an entire rabbit hole on whether we think differences in magnitude constitute differences between groups, which I will not get involved in. This discussion also completely leaves out that some algorithms (e.g., S-GIMME) leverage additional information beyond the simple \(\mathbf{\Phi}\) matrices by making use of contemporaneous effects as well when constructing the adjacency matrix.
The role of “fuzziness” in the clustering of VAR-based models then becomes an interesting issue. What makes a subject “fuzzy”? When constructing adjacency matrices in a fashion similar to S-GIMME or the scgVAR models, it may be a subject who has many structural similarities to multiple groups, which could be indicative of a subject that is functionally between two “states”, if we define a state as a common (sub)group model. In the following illustrations, we will use the raw counts in a manner similar to S-GIMME by doing the following steps:
1. Fit a VAR(1) model to each subject and save the \(\mathbf{\Phi}\) matrix.
2. Set all non-significant coefficients in each subject’s \(\mathbf{\Phi}\) to zero.
3. For each pair of subjects, count the coefficients that are nonzero in both matrices and share the same sign, and store that count in the corresponding cell of the adjacency matrix.
4. Fix the diagonal of the adjacency matrix to zero.
Below, we will cluster the VAR(1) models from the De Vos (2017) dataset using the method described above. In the code chunk below, we read in the imputed dataset, trim off some cases that have had convergence issues in the past and fit the VAR(1) model to each subject in an iterative fashion. Following this, we save out the \(\mathbf{\Phi}\) matrices and remove all non-significant paths resulting in sparse regression matrices for each subject and we can then proceed to the creation of our adjacency matrix.
library(vars)  # provides the VAR() function

data1 = read.csv('./Datasets/imputed.csv')
# Trim cases with known convergence issues
# (unique() indices shift after each removal, so repeating [32] targets a different id)
data1 = data1[!data1$id == 22,]
data1 = data1[!data1$id == unique(data1$id)[32],]
data1 = data1[!data1$id == unique(data1$id)[32],]
data1 = data1[!data1$id == unique(data1$id)[38],]
mod.list = list()
lag.mat = list()
mod.covs = list()
for(i in unique(data1$id)){
  tmpdat = data1[data1$id == i, 3:16]
  mod.list[[i]] = VAR(tmpdat, p = 1, type = 'none')  # per-subject VAR(1)
  lagmat = NULL
  for(j in 1:14){
    lagmat = rbind(lagmat, mod.list[[i]]$varresult[[j]][[1]])  # stack rows of Phi
  }
  lag.mat[[i]] = lagmat
  mod.covs[[i]] = summary(mod.list[[i]])$covres  # residual covariance (Psi)
}
clag.mat = list(); cmod.covs = list()
cmod.list = list(); covmats = list()
for(i in 1:50){
cmod.list[[i]] = mod.list[[unique(data1$id)[i]]]
clag.mat[[i]] = lag.mat[[unique(data1$id)[i]]]
cmod.covs[[i]] = mod.covs[[unique(data1$id)[i]]]
}
mods = list()
for(j in 1:50){
  tmp = summary(cmod.list[[j]])
  res = NULL
  for(k in 1:14){
    tmp1 = tmp$varresult[[k]][4][[1]][, c(1, 4)]  # estimates and p-values
    test = matrix(NA, nrow(tmp1), 1)
    for(i in 1:nrow(tmp1)){
      test[i,] = ifelse(tmp1[i, 2] < 0.05, tmp1[i, 1], 0)  # zero out non-significant paths
    }
    res = rbind(res, c(test))
  }
  mods[[j]] = res  # sparsified Phi for subject j
  # Model-implied stationary covariance: vec(Sigma) = solve(I - kronecker(Phi, Phi)) %*% vec(Psi)
  covmat = matrix(solve(diag(14^2) - kronecker(res, res)) %*%
                    c(cmod.covs[[j]]), 14, 14)
  covmats[[j]] = buildSSCP(sample.cov = covmat,
                           sample.mean = rep(0, 14),
                           sample.nobs = 90,
                           namesofmat = paste0('V', 1:14))
}
In the code chunk below, we create our adjacency matrix based on the significant VAR(1) \(\mathbf{\Phi}\) coefficients that are common in sign and non-zero to match the S-GIMME construction method. Once the adjacency matrix is created, we set the diagonal elements to zero as is standard practice in community detection applications.
adjmat = matrix(NA, 50, 50)
adjmat.var = matrix(NA, 50, 50)
for(i in 1:50){
for(j in 1:50){
adjmat.var[i,j] = sum(sign(c(mods[[i]])) == sign(c(mods[[j]])) &
abs(c(mods[[i]])) > 0 & abs(c(mods[[j]])) > 0)
# adjmat.var[i,j] = sum(sign(c(clag.mat[[i]])) == sign(c(clag.mat[[j]])) &
# abs(c(clag.mat[[i]])) > 0 & abs(c(clag.mat[[j]])) > 0)
adjmat[i,j] = sum(sign(c(covmats[[i]])) == sign(c(covmats[[j]])) &
abs(c(covmats[[i]])) > 0 & abs(c(covmats[[j]])) > 0)
}
}
diags = NULL
for(i in 1:50){
diags = rbind(diags, diag(clag.mat[[i]]))
}
diag(adjmat.var) = 0
diag(adjmat) = 0
Now, we can apply the FKM() function to the
constructed adjacency matrix in both the hard- and fuzzy-clustering
approaches. In the code chunk below, we apply the community detection
approaches for a 2-group solution and print out the first 5 subjects’
membership degrees. Recall, the membership degrees tell us how strongly a
subject belongs to one subgroup or another and sum to 1 across the
enumerated subgroups.
When we look at the hard 2-subgroup solution, we see that it looks promising! Of the first 5 subjects, all but one are classified together. Considering that the first 26 subjects are clinically depressed, this seems like a good start.
round(FKM(adjmat.var,2,1.01)$U[1:5,], 2)
## Clus 1 Clus 2
## Obj 1 1 0
## Obj 2 1 0
## Obj 3 0 1
## Obj 4 1 0
## Obj 5 1 0
For curiosity’s sake, let’s expand this out to all of the depressed subjects, which we do below. On further inspection, 11 subjects are classified into group 1 and 15 are classified into group 2. A much wider split than the first 5 cases would have led us to believe.
round(FKM(adjmat.var,2,1.01)$U[1:26,], 2)
## Clus 1 Clus 2
## Obj 1 1 0
## Obj 2 1 0
## Obj 3 0 1
## Obj 4 1 0
## Obj 5 1 0
## Obj 6 0 1
## Obj 7 1 0
## Obj 8 1 0
## Obj 9 0 1
## Obj 10 1 0
## Obj 11 0 1
## Obj 12 0 1
## Obj 13 0 1
## Obj 14 1 0
## Obj 15 0 1
## Obj 16 0 1
## Obj 17 0 1
## Obj 18 1 0
## Obj 19 0 1
## Obj 20 1 0
## Obj 21 0 1
## Obj 22 0 1
## Obj 23 0 1
## Obj 24 0 1
## Obj 25 1 0
## Obj 26 0 1
Now, for illustration, we can try fuzzy clustering on the same data. The result below suggests that the values are 0.5 in either direction for the membership degrees. This indicates that our algorithm is not confident in the placement of any one individual to any given cluster. Thus, we can likely infer that the hard clustering solution is mostly noise.
FKM(adjmat.var,2,2)$U[1:5,]
## Clus 1 Clus 2
## Obj 1 0.5 0.5
## Obj 2 0.5 0.5
## Obj 3 0.5 0.5
## Obj 4 0.5 0.5
## Obj 5 0.5 0.5
Let’s try visualizing the subject-by-subject network and apply coloring based on the cluster solutions in the code chunk below. What we see is that the hard clustering solution essentially creates concentric subgroups where the inner, green subgroup is surrounded by the red subgroup. Moreover, the similarities between–for example–Subject 39 and Subject 14 are 0. Thus, they shouldn’t really even be in a community together at all.
par(mfrow = c(2,1))
cols1 = FKM(adjmat.var,2,1.01)$clus[,1]
cols2 = apply(FKM(adjmat.var,2,2)$U, 1, max) < 0.70
cols2 = ifelse(as.numeric(cols2), 'gray', 'white')
qgraph(adjmat.var, layout = 'spring', color = cols1, title = 'K-Means')
qgraph(adjmat.var, layout = 'spring', color = cols2, title = 'Fuzzy C-Means')
Let’s take a look at the 2-subgroup solution brought to us by the \(k\)-means algorithm by fitting a group-level model within each subgroup, which we do below. Overall, the groups do look distinctly different, with one model characterized by many strong, significant connections in contrast to the other, which is much weaker overall but stronger in the autoregressive terms for certain variables such as depression, guilt, and anxiety. All certainly stronger than in the first subgroup pictured on top.
But if we were to look into a specific subset of subjects within that first group, we may find something interesting.
s1 = unique(data1$id)[cols1==1]
s2 = unique(data1$id)[!cols1==1]
tmpdat = data1[data1$id %in% s1, 3:16]
mod1 = VAR(tmpdat, p = 1, type = 'none')
lagmat = NULL
for(j in 1:14){
lagmat = rbind(lagmat, mod1$varresult[[j]][[1]])
}
tmp = summary(mod1)
res = NULL
for(k in 1:14){
tmp1 = tmp$varresult[[k]][4][[1]][,c(1,4)]
test = matrix(NA, nrow(tmp1), 1)
for(i in 1:nrow(tmp1)){
test[i,] = ifelse(tmp1[i,2] < 0.05, tmp1[i,1], 0)
}
res = rbind(res, c(test))
}
res1=res
tmpdat = data1[data1$id %in% s2, 3:16]
mod1 = VAR(tmpdat, p = 1, type = 'none')
lagmat = NULL
for(j in 1:14){
lagmat = rbind(lagmat, mod1$varresult[[j]][[1]])
}
tmp = summary(mod1)
res = NULL
for(k in 1:14){
tmp1 = tmp$varresult[[k]][4][[1]][,c(1,4)]
test = matrix(NA, nrow(tmp1), 1)
for(i in 1:nrow(tmp1)){
test[i,] = ifelse(tmp1[i,2] < 0.05, tmp1[i,1], 0)
}
res = rbind(res, c(test))
}
par(mfrow = c(2,1))
lay = averageLayout(list(qgraph(res1),qgraph(res)))
qgraph(res1, layout = lay, theme = 'colorblind', edge.labels = TRUE, labels = vars, maximum = 0.5)
qgraph(res, layout = lay, theme = 'colorblind', edge.labels = TRUE, labels = vars, maximum = 0.5)
I had noted earlier that several subjects in the group above were not even remotely connected to each other in the adjacency matrix. If we want to see that subjects within the outer subgroup can have literally nothing in common, we can highlight that with a sample of 2 subjects. Not only do these subjects’ models widely differ from the subgroup that they’re in, they also significantly differ from each other. This wide degree of heterogeneity within the sample was highlighted by De Vos and colleagues (2017).
par(mfrow = c(2,1))
lay = averageLayout(list(qgraph(mods[[14]]),qgraph(mods[[39]])))
qgraph(mods[[14]], layout = lay, theme = 'colorblind', edge.labels = TRUE, labels = vars, fade = TRUE)
qgraph(mods[[39]], layout = lay, theme = 'colorblind', edge.labels = TRUE, labels = vars, fade = TRUE)
We can repeat this process using various metrics besides the \(\mathbf{\Phi}\) matrices. Below, we try creating an adjacency matrix from the model-implied covariance matrices. In this case, we do not find any particularly strong group separations.
FKM(adjmat,2,2)$U[1:3,]
## Clus 1 Clus 2
## Obj 1 0.5 0.5
## Obj 2 0.5 0.5
## Obj 3 0.5 0.5
FKM(adjmat,3,2)$U[1:3,]
## Clus 1 Clus 2 Clus 3
## Obj 1 0.3333333 0.3333333 0.3333333
## Obj 2 0.3333333 0.3333333 0.3333333
## Obj 3 0.3333333 0.3333333 0.3333333
FKM(adjmat,4,2)$U[1:3,]
## Clus 1 Clus 2 Clus 3 Clus 4
## Obj 1 0.25 0.25 0.25 0.25
## Obj 2 0.25 0.25 0.25 0.25
## Obj 3 0.25 0.25 0.25 0.25
Across the board, when enumerating 2 through 7 groups, no clear partition of subjects into groups emerges. Does this mean that dynamics are worthless? Nope! In fact, finding no groups is likely a positive sign in the presence of widespread heterogeneity in dynamics. What if we tried looking only at the AR effects?
FKM(diags,2,2)$U[1:3,]
## Clus 1 Clus 2
## Obj 1 0.5 0.5
## Obj 2 0.5 0.5
## Obj 3 0.5 0.5
FKM(diags,3,2)$U[1:3,]
## Clus 1 Clus 2 Clus 3
## Obj 1 0.3333333 0.3333333 0.3333333
## Obj 2 0.3333333 0.3333333 0.3333333
## Obj 3 0.3333333 0.3333333 0.3333333
FKM(diags,4,2)$U[1:3,]
## Clus 1 Clus 2 Clus 3 Clus 4
## Obj 1 0.25 0.25 0.25 0.25
## Obj 2 0.25 0.25 0.25 0.25
## Obj 3 0.25 0.25 0.25 0.25
What about select AR effects? Namely, only
looking at the AR-terms for happiness and
depression:
From the plot here, we see that the separation is based primarily on
the AR term associated with happiness and less so on the one for
depression, and–on personal checking–there isn’t really an association
with clinical memberships. But it is interesting that the data can be
divided in such finely-tuned ways! This isn’t a
suggestion that we dig through our data and try to find any set of
clusters, but rather a way to highlight that careful consideration needs to go
into the content of the adjacency matrix.
Ultimately, the decision of what we put into our adjacency matrices will determine the quality of our partitions and whether our partitions are meaningful. If theory guides us towards one avenue, we should pursue it.
So far, we’ve seen that subjects can be clustered in a hard fashion as well as in a fuzzy fashion on their means and their dynamics with ultimate consideration coming down to theory.
However, does that mean that clustering on dynamics is worthless? No, absolutely not. De Vos and colleagues (2017) explicitly noted that their subjects were incredibly heterogeneous in their dynamics. In fact, it should be a good thing that we similarly don’t find clear separations between groups as it reaffirms their work.
But what if we did have a clear separation between groups? Maybe in Simulation Land we could see how well our algorithms perform.
Below, I simulated \(N = 30\) subjects in Simulation Land. 10 subjects are defined by one VAR model, 10 subjects are defined by another, and the final 10 are a mixture of the two distinct groups.
The groups are separated by 9 distinct cross-regressions and had 500 simulated time-points each.
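The simulation code isn’t shown; a minimal sketch of the design (two distinct \(\mathbf{\Phi}\) matrices plus a blended “fuzzy” group) might look like the following, where all specific coefficient values are my assumptions rather than the ones used in the chapter:

```r
set.seed(42)
# Simulate one subject's data from a VAR(1) with coefficient matrix Phi
simVAR = function(Phi, Tt = 500){
  p   = nrow(Phi)
  eta = matrix(0, Tt, p)
  for(t in 2:Tt) eta[t,] = Phi %*% eta[t - 1,] + rnorm(p)
  eta
}
PhiA = diag(0.4, 3); PhiA[1, 2] =  0.3   # group 1 dynamics
PhiB = diag(0.4, 3); PhiB[3, 1] = -0.3   # group 2 dynamics
PhiM = 0.5 * PhiA + 0.5 * PhiB           # the 'fuzzy' mixture group
dat = c(replicate(10, simVAR(PhiA), simplify = FALSE),
        replicate(10, simVAR(PhiB), simplify = FALSE),
        replicate(10, simVAR(PhiM), simplify = FALSE))
length(dat)  # 30 simulated subjects
```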
After fitting the VAR models to each subject using the
VAR() function in the vars package, we then
save the resulting \(\mathbf{\Phi}\)
matrices and construct an adjacency matrix.
Upon searching it with hard and soft clustering, we find the following:
Essentially, even though we have explicitly fuzzy subjects, the hard clustering algorithm forces them into one set or another. In contrast, a hard 3-group solution separates out the fuzzy subjects into a distinct group, and our fuzzy analysis acknowledges them as uncertainly placed.
If we compare the hard cluster solutions (2 or 3 groups), we actually find that the 2 group solution scores better than the 3 group solution:
round(Fclust.index(clust11),2)
## The default value alpha=1 has been set for computing SIL.F
## PC PE MPC SIL SIL.F XB
## 1.00 0.00 1.00 0.63 0.63 0.20
round(Fclust.index(clust12),2)
## The default value alpha=1 has been set for computing SIL.F
## PC PE MPC SIL SIL.F XB
## 1.00 0.00 1.00 0.57 0.57 0.36
Now, what’s wrong with accidentally placing a fuzzy person with the wrong group? Well, it fudges our parameter estimates. Let’s look back at the 2-group solution for the hard-clustering VAR. We see that the algorithm says that subjects 11 through 30 belong together.
So, what’s the bias introduced by incorporating those additional 10 wrong subjects into the correct second model?
|      | Correct | Incorrect |
|------|---------|-----------|
| Mean | 0.013   | 0.028     |
| SD   | 0.009   | 0.017     |
The absolute bias is shown above for just a single replication, and we see that–in general–the absolute bias is roughly twice as large in the wrong model, with a greater SD as well. In terms of relative bias, both models do well in recovering the true AR coefficients because… well… everyone has those. But in the cross-regression parameters, inclusion of the wrong subjects results in a significant increase in the relative biases, going as high as -0.29 compared to 0.09 in the correct grouping scenario.
Other ways of calculating influence? We’ve been trying to incorporate the multivariate Cook’s Distance, as well as the innovative-outliers functionality in dynr [not included because that would take 18 years to run and compile], and have not yet found immense success in highlighting the effects a wrongly-placed person may have on the sample in a way that is immediately obvious and clear.